Abstract

The design of a scheduling scheme is crucial for the efficiency and user fairness of wireless networks. Assuming that the quality of all user channels is available to a central controller, a simple scheme that maximizes the utility function defined as the sum of the logarithms of the users' throughputs has been shown to guarantee proportional fairness. However, acquiring the channel quality information may consume a substantial amount of resources. In this work, it is assumed that probing the quality of each user's channel takes a fraction of the coherence time, so that the amount of time left for data transmission is reduced. As a result, the multiuser diversity gain does not always increase as the number of users increases. When the statistics of the channel quality are available to the controller, the problem of sequential channel probing for user scheduling is formulated as an optimal stopping time problem, and a joint channel probing and proportional fair scheduling scheme is developed. This scheme is extended to the case where the channel statistics are not available to the controller, in which case a joint learning, probing and scheduling scheme is designed by studying a generalized bandit problem. Numerical results demonstrate that the proposed scheduling schemes can provide significant gain over existing schemes.

I. Introduction 

Efficient and fair scheduling is important for wireless systems with limited resources and heterogeneous user conditions. A large class of resource allocation schemes with fairness considerations is obtained by maximizing some utility function of the throughput [1]. In particular, proportional fairness is achieved when the utility is the sum of the logarithms of the users' throughputs. In existing third-generation wireless systems, such as EV-DO and HSDPA, the proportional fair (PF) scheduling scheme is employed at the base station to schedule downlink traffic to mobile users. The PF scheme strikes a good balance between throughput efficiency and fairness by exploiting multiuser diversity [2] and the game-theoretic equilibrium [3]. Owing to its favorable performance and low implementation complexity, PF scheduling has been analyzed and applied extensively. For example, there have been studies of the convergence and optimality [4], stability [5], throughput [6] and capacity region of PF scheduling.

Most previous work on PF scheduling assumes that the instantaneous channel quality information (CQI) of all users is known to the scheduler at no cost. In practice, however, acquiring the CQI often consumes a significant amount of resources in terms of time, bandwidth and power. It is important to understand the impact of this cost when the number of users is large, because the cost may scale linearly with the user population. The goal of this work is to answer the following two questions: 1) to what extent does CQI acquisition affect the scheduling? and 2) how should the users be probed and scheduled to achieve the best performance with proportional fairness?

Related work has studied the impact of channel uncertainty on communication systems. The loss of throughput caused by poor estimates of channel quality is quantified in [8]. Joint channel probing and user scheduling has also been addressed recently. Several schemes with the objective of maximizing the system throughput have been designed in [9]-[12], and the authors of [13]-[15] propose schemes for stabilizing the queues and characterize the network throughput region. In contrast to the preceding works, the goal of this paper is to design a proportional fair scheduling scheme that takes into account the cost of channel probing. Our previous work [16] outlined the scheme and its performance. In this paper, we not only present the derivation of the scheme with rigorous arguments, but also establish its asymptotic behavior and its optimality. In addition, the scheme is extended to a more general scenario. The organization and main contributions of this work are as follows:

• Section II describes the network model.

• In Section III, we assume that the prior distribution of the CQI is known to the scheduler, and formulate the problem of sequentially probing user channels to make a scheduling decision as a stopping time problem. A simple scheme based on maximizing the sum logarithm throughput of all users is shown to guarantee proportional fairness and convergence. The scheduling gain of the scheme is determined analytically, and further reduction of the computational complexity is discussed.

• In Section IV, the statistics of the CQI are assumed to be unavailable to the scheduler. The problem is formulated as a generalized bandit problem, and a joint learning, probing and scheduling scheme is proposed.

• In Section V, significant advantages of the proposed schemes are demonstrated through numerical experiments. In typical scenarios where the statistics of the CQI are not available, the joint learning, probing and scheduling scheme achieves almost the same performance as in the case where the statistics are known.

II. The Network Model 

Consider a wireless system with one controller and K users with time-varying channel quality, such as the downlink of a cellular system. Time is divided into unit-length slots, and only one user can be served in each slot. As in most related work (e.g., H| and 0), the transmit power is assumed to be fixed, so dynamic power allocation is not considered. Thus the achievable rate is determined solely by the instantaneous channel quality. Moreover, we assume saturated traffic for all users.

Assume slow fading, where the duration of a slot is much shorter than the channel coherence time, so that the channel quality remains constant during each slot. We make the following homogeneous rate assumption: the rate of each user, normalized by its mean value, follows the same distribution.

(A1) Let $X_1, \dots, X_K$ be independent and identically distributed (i.i.d.) non-negative random variables with unit mean. Let $r_1, \dots, r_K > 0$ be constants, and let $R_k = r_k X_k$ for $k = 1, \dots, K$. The achievable rates $\{R_k(n) : k = 1, \dots, K;\ n = 1, 2, \dots\}$ are independent. For every user $k$, the rates over the time slots, $R_k(1), R_k(2), \dots$, are i.i.d. with the same distribution as $R_k$. Clearly, $\mathbb{E}[R_k(n)] = r_k$.

The instantaneous achievable rates of the users are not known a priori. During each slot $n$, obtaining the achievable rate $R_k(n)$ requires the scheduler to probe the channel of user $k$, which uses a fraction $\beta$ of the slot. Let $I_k(n)$ be the indicator of the event that user $k$ is scheduled for transmission in slot $n$, and let $J(n)$ denote the number of users probed in slot $n$. The amount of data transmitted to or by user $k$ during slot $n$ is $B_k(n) = (1 - J(n)\beta) R_k(n) I_k(n)$, which is nonzero for only one user in each slot. The throughput of user $k$ averaged over $n$ slots is thus

$$T_k(n) = \frac{1}{n} \sum_{j=1}^{n} B_k(j). \qquad (1)$$
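To make the bookkeeping above concrete, here is a minimal Python sketch (the helper names `slot_bits` and `update_throughput` are hypothetical) that computes $B_k(n)$ for one slot and maintains the running average (1) recursively, so that the history of $B_k(j)$ need not be stored.

```python
from typing import List, Optional


def slot_bits(K: int, beta: float, num_probed: int,
              scheduled: Optional[int], rate: float) -> List[float]:
    """B_k(n) = (1 - J(n) * beta) * R_k(n) * I_k(n) for k = 0..K-1.

    Only the scheduled user receives a nonzero amount of data.
    """
    bits = [0.0] * K
    if scheduled is not None:
        bits[scheduled] = max(1.0 - num_probed * beta, 0.0) * rate
    return bits


def update_throughput(T_prev: List[float], bits: List[float], n: int) -> List[float]:
    """Running form of (1): T_k(n) = ((n-1)/n) * T_k(n-1) + B_k(n) / n."""
    return [((n - 1) / n) * t + b / n for t, b in zip(T_prev, bits)]


# Usage: three slots with K = 2 users and beta = 0.1.
history = [
    slot_bits(2, 0.1, 1, 0, 2.0),   # probe 1 user, serve user 0 at rate 2.0
    slot_bits(2, 0.1, 2, 1, 3.0),   # probe 2 users, serve user 1 at rate 3.0
    slot_bits(2, 0.1, 1, 0, 1.0),   # probe 1 user, serve user 0 at rate 1.0
]
T = [0.0, 0.0]
for n, bits in enumerate(history, start=1):
    T = update_throughput(T, bits, n)
print(T)
```

The recursive update gives the same result as averaging all $B_k(j)$ directly, which is the form used later in (3).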

III. Joint Probing and Scheduling with Known Channel Statistics 

In this section, we consider the case where the statistics of $\mathbf{R} = [R_1, \dots, R_K]$ are known to the scheduler and design a proportional fair scheme.

A. The Algorithm 

Consider first a scheme that maximizes the utility defined as the sum logarithm throughput:

$$u(\mathbf{T}(n)) = \sum_{k=1}^{K} \ln T_k(n). \qquad (2)$$

Note that by (1),

$$T_k(n) = \frac{n-1}{n}\, T_k(n-1) + \frac{1}{n}\, B_k(n). \qquad (3)$$
Hence the increase of the utility function after the $n$-th slot is

$$\begin{aligned}
u(\mathbf{T}(n)) - u(\mathbf{T}(n-1)) &= \sum_{k=1}^{K} \big(\ln T_k(n) - \ln T_k(n-1)\big) \\
&= \sum_{k=1}^{K} \ln\Big(\frac{n-1}{n} + \frac{1}{n}\,\frac{B_k(n)}{T_k(n-1)}\Big) \\
&= \sum_{k=1}^{K} \ln\Big(\frac{n-1}{n} + \frac{1}{n}\,(1 - J(n)\beta)\, s_k(n)\, I_k(n)\Big), \qquad (4)
\end{aligned}$$

where the throughput-normalized rate is

$$s_k(n) = \frac{R_k(n)}{T_k(n-1)}. \qquad (5)$$

Since the indicator $I_k(n)$ is zero for all but one user in each slot, every unserved user contributes the same term $\ln\frac{n-1}{n}$ to (4). Hence, for a given number of probes $J(n)$, greedily maximizing the utility increment in slot $n$ amounts to scheduling the user with the maximum $s_k(n)$, which is the classical PF scheduling algorithm.
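The following sketch evaluates the exact increment (4) for each possible choice of served user and checks that it is maximized by the classical PF rule. It assumes $n \ge 2$ (so that all logarithms in (4) are finite) and the same number of probes $J(n)$ for every candidate; the function names are illustrative only.

```python
import math
from typing import List


def utility_increment(T_prev: List[float], n: int, num_probed: int,
                      beta: float, chosen: int, rates: List[float]) -> float:
    """Exact increment u(T(n)) - u(T(n-1)) from (4) when user `chosen` is served (n >= 2)."""
    total = 0.0
    for k, t in enumerate(T_prev):
        s_k = rates[k] / t                     # throughput-normalized rate (5)
        served = 1.0 if k == chosen else 0.0   # indicator I_k(n)
        total += math.log((n - 1) / n + (1 - num_probed * beta) * s_k * served / n)
    return total


def pf_choice(T_prev: List[float], rates: List[float]) -> int:
    """Classical PF rule: serve the user with the largest s_k(n) = R_k(n) / T_k(n-1)."""
    return max(range(len(rates)), key=lambda k: rates[k] / T_prev[k])


# Usage: normalized rates s = [1.5, 1.25, 1.8], so user 2 should be served.
T_prev = [1.0, 2.0, 0.5]
rates = [1.5, 2.5, 0.9]
n, J, beta = 10, 3, 0.1
best = max(range(3), key=lambda k: utility_increment(T_prev, n, J, beta, k, rates))
print(best, pf_choice(T_prev, rates))
```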

However, since the instantaneous rates $R_k(n)$ are unknown a priori, the scheduler can only probe the users' rates and obtain the $s_k(n)$ one by one in each slot. We therefore formulate the following optimal stopping time problem [18]. Since the scheduling decision made in one slot has no impact on future realizations of the rates, it suffices to consider one arbitrary slot and omit the time index $n$. For the scheduler, the joint probing and scheduling problem at the beginning of the slot is defined by two objects:

(i) The independent throughput-normalized rates $s_1, \dots, s_K$.

(ii) A sequence of positive-valued reward functions $y_1, \dots, y_K$, where, if $j$ channels have been probed to reveal their throughput-normalized instantaneous rates $t_1, \dots, t_j$, the reward of terminating the probing phase and scheduling the best user found so far is

$$y_j(t_1, \dots, t_j) = (1 - j\beta) \max(t_1, \dots, t_j). \qquad (6)$$

The theory of optimal stopping is concerned with determining the stopping time $J$ that maximizes the expected reward $\mathbb{E}[y_J]$. The maximum number of probes in every slot is $J_{\max} = \min(K, \lfloor 1/\beta \rfloor)$. Compared with the classical optimal stopping problem, the formulation above is more general in the sense that the probing order of the $s_k$ is not fixed in advance. Hence the joint probing and scheduling scheme includes two tasks in each slot: determining the order in which users are probed, and selecting one user as the destination at a proper (stopping) time. Recalling the objective of maximizing $\mathbb{E}[y_J]$, the user with the largest $\mathbb{E}[s_k(n)]$ should be probed first, then the user with the second largest, and so on. From Assumption (A1), $\bar{s}_k(n) \triangleq \mathbb{E}[s_k(n)] = r_k / T_k(n-1)$. Hence the probing order is $\pi(n) = (k_1, \dots, k_K)$ such that $\bar{s}_{k_1}(n) \ge \dots \ge \bar{s}_{k_K}(n)$. With the probing order determined, the decision on when to stop can be addressed by investigating the structural properties of the problem.

Theorem 1: Under the homogeneous rate assumption (A1), the joint probing and scheduling problem is a monotone stopping problem [18, Chapter 5]; that is, if $\mathcal{E}_j$ denotes the event

$$\Big\{ y_j(s_{k_1}, \dots, s_{k_j}) \ge \mathbb{E}\big[ y_{j+1}(s_{k_1}, \dots, s_{k_{j+1}}) \,\big|\, s_{k_1}, \dots, s_{k_j} \big] \Big\}, \qquad (7)$$

then $\mathcal{E}_j \subseteq \mathcal{E}_{j+1}$ for $1 \le j < J_{\max} - 1$.

Proof: See Appendix A.

Since the problem is monotone, by [18, Theorem 1, Chapter 5] the one-stage look-ahead rule is optimal. The one-stage look-ahead rule stops if the reward for stopping at the current stage is at least as large as the expected reward of continuing one more stage and then stopping. Let $w_j$ denote the largest observed throughput-normalized rate after probing $j$ users, and let $a \vee b = \max(a, b)$. The optimal stopping time is then

$$J^* = \min\Big\{ j \ge 1 : (1 - j\beta)\, w_j \ge (1 - (j+1)\beta)\, \mathbb{E}\Big[ w_j \vee \frac{R_{k_{j+1}}(n)}{T_{k_{j+1}}(n-1)} \,\Big|\, w_j \Big] \Big\}, \qquad (8)$$

which solves the stopping problem almost surely in each slot. The resulting optimal PF joint probing and scheduling (JPS-PF) scheme is described in Algorithm 1.
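To illustrate the one-stage look-ahead rule (8), the sketch below assumes, as in Section V, that $X_k$ is exponential with unit mean. Then $Y = R_{k_{j+1}}(n)/T_{k_{j+1}}(n-1)$ is exponential with mean $m = r_{k_{j+1}}/T_{k_{j+1}}(n-1)$, and $\mathbb{E}[w \vee Y] = w + \int_w^\infty e^{-y/m}\,dy = w + m e^{-w/m}$. This closed form is specific to the exponential assumption; other rate distributions need a different expectation.

```python
import math
import random


def expected_max_exponential(w: float, m: float) -> float:
    """E[max(w, Y)] for Y ~ Exponential(mean m) and w >= 0, i.e. w + m * exp(-w / m)."""
    return w + m * math.exp(-w / m)


def should_stop(j: int, w: float, beta: float, next_mean: float) -> bool:
    """One-stage look-ahead rule (8): stop after j probes if stopping now pays at least
    as much as one more probe. next_mean = r_{k_{j+1}} / T_{k_{j+1}}(n-1) (exponential rates)."""
    stop_reward = (1 - j * beta) * w
    continue_reward = (1 - (j + 1) * beta) * expected_max_exponential(w, next_mean)
    return stop_reward >= continue_reward


# Usage: Monte Carlo check of the closed form, then two decisions with beta = 0.1.
rng = random.Random(1)
w, m = 0.7, 1.3
mc = sum(max(w, rng.expovariate(1.0 / m)) for _ in range(200_000)) / 200_000
print(mc, expected_max_exponential(w, m))
print(should_stop(1, 2.0, 0.1, 1.0), should_stop(1, 1.0, 0.1, 1.0))
```

With $\beta = 0.1$ and unit next mean, the rule after the first probe stops exactly when $w e^{w} \ge 8$, i.e., when $w$ exceeds roughly 1.61.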

B. On the Optimality of Algorithm 1 

To establish the optimality of Algorithm 1, we first show that it converges.
Algorithm 1: JPS-PF

1  Initialization: $T_k(0) \leftarrow 1$ for $k = 1, \dots, K$;
2  for $n = 1, 2, \dots$ do
3      $\bar{s}_k(n) \leftarrow r_k / T_k(n-1)$. Sort the throughput-normalized mean rates $\bar{s}_k(n)$ ($k = 1, \dots, K$) in descending order: $\bar{s}_{k_1}(n) \ge \dots \ge \bar{s}_{k_K}(n)$;
4      $j \leftarrow 0$, $w \leftarrow 0$;
5      do
6          $j \leftarrow j + 1$;
7          Probe user $k_j$ and get the rate $R_{k_j}(n)$;
8          $w \leftarrow w \vee R_{k_j}(n) / T_{k_j}(n-1)$;
9      while $(1 - j\beta)\, w < (1 - (j+1)\beta)\, \mathbb{E}\big[ w \vee R_{k_{j+1}}(n) / T_{k_{j+1}}(n-1) \,\big|\, w \big]$;
10     Transmit to the probed user whose normalized rate equals $w$. Update $\mathbf{T}(n)$;
11 end
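A compact Python simulation of Algorithm 1 may help in checking Theorem 2 and Corollary 1 numerically. It is a sketch under stated assumptions that are ours, not part of the listing: exponential $X_k$ (so the expectation in line 9 has the closed form $w + m e^{-w/m}$), an explicit guard $j < J_{\max}$ on continued probing, and a small floor `eps` on $T_k(n-1)$ to avoid division by zero for users not yet served (the recursion (3) sets $T_k(1) = 0$ for them).

```python
import math
import random
from typing import List, Tuple


def jps_pf(r: List[float], beta: float, num_slots: int,
           seed: int = 0) -> Tuple[List[float], List[int]]:
    """Simulate Algorithm 1 (JPS-PF), assuming X_k ~ Exp(1) so R_k(n) ~ Exp(mean r_k).

    Returns the throughputs T_k(num_slots) and the number of slots each user was served.
    """
    rng = random.Random(seed)
    K = len(r)
    j_max = min(K, int(math.floor(1.0 / beta + 1e-9)))
    eps = 1e-9                 # assumption: floor on T_k so r_k / T_k stays finite
    T = [1.0] * K              # line 1: T_k(0) = 1
    served = [0] * K
    for n in range(1, num_slots + 1):
        denom = [max(t, eps) for t in T]
        # line 3: probe in descending order of the mean normalized rate r_k / T_k(n-1)
        order = sorted(range(K), key=lambda k: r[k] / denom[k], reverse=True)
        j, w, best, best_rate = 0, 0.0, order[0], 0.0
        while True:
            k = order[j]
            j += 1                                       # line 6
            rate = rng.expovariate(1.0 / r[k])           # line 7: probe user k_j
            if rate / denom[k] > w:                      # line 8: w <- w v R / T
                w, best, best_rate = rate / denom[k], k, rate
            if j >= j_max:                               # no further probe is possible
                break
            m = r[order[j]] / denom[order[j]]            # mean normalized rate of k_{j+1}
            # line 9, using E[w v Y] = w + m * exp(-w / m) for exponential Y of mean m
            if (1 - j * beta) * w >= (1 - (j + 1) * beta) * (w + m * math.exp(-w / m)):
                break
        bits = (1 - j * beta) * best_rate                # B(n) = (1 - J beta) R
        # line 10: T_k(n) = ((n-1)/n) T_k(n-1) + B_k(n) / n
        T = [((n - 1) / n) * t + (bits / n if k == best else 0.0)
             for k, t in enumerate(T)]
        served[best] += 1
    return T, served


# Usage: four users with mean rates 1..4 and beta = 0.1.
r = [1.0, 2.0, 3.0, 4.0]
T, served = jps_pf(r, beta=0.1, num_slots=20000, seed=7)
print([round(t / rk, 3) for t, rk in zip(T, r)])   # T_k / r_k, cf. (9)
print([c / 20000 for c in served])                 # selection frequencies, cf. Corollary 1
```

In such a run the ratios $T_k/r_k$ should be close to one another, and each user should be served in roughly $1/K$ of the slots.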



Theorem 2: Assume (A1). Then, for any initial condition, the throughput sequence $\mathbf{T}(n)$ generated by Algorithm 1 converges almost surely to the limit point $\mathbf{T}^*$ of the ordinary differential equation $\dot{\mathbf{T}}(t) = h(\mathbf{T}(t))$, where $h(\mathbf{T}) = -\mathbf{T} + \mathbb{E}[\mathbf{B}(n) \mid \mathbf{T}(n-1) = \mathbf{T}]$. Moreover, all users' steady-state throughputs are proportional to their mean rates with an identical ratio $\kappa$:

$$\frac{T_1^*}{r_1} = \frac{T_2^*}{r_2} = \dots = \frac{T_K^*}{r_K} = \kappa. \qquad (9)$$

Proof: Let $\mathbf{M}(n) = \mathbf{B}(n) - \mathbb{E}[\mathbf{B}(n) \mid \mathbf{T}(n-1)]$. By (3), the update of the users' throughput can be written in the form of a stochastic approximation iteration [19, Eqn. 2.1.1]:

$$\mathbf{T}(n) = \mathbf{T}(n-1) + a(n)\big[h(\mathbf{T}(n-1)) + \mathbf{M}(n)\big],$$

where $a(n) = 1/n$. This is a standard stochastic approximation expression. It is easy to verify that $h(\cdot)$ is Lipschitz, that the step sizes satisfy $\sum_n a(n) = \infty$ and $\sum_n a(n)^2 < \infty$, and that $\mathbf{T}(n)$ is bounded. Furthermore, $\mathbb{E}[\mathbf{M}(n) \mid \mathbf{M}(1), \dots, \mathbf{M}(n-1)] = 0$, so $\mathbf{M}(n)$ is a martingale difference sequence. Hence the throughput update under the proposed scheme satisfies assumptions (A1)-(A4) in [19, Section 2.1], and applying [19, Section 2.1, Theorem 2] directly yields the convergence conclusion.

With convergence established, the remainder of the proof is by contradiction. Suppose (9) does not hold at steady state and, without loss of generality, $T_1^*/r_1 < T_2^*/r_2$. Consider the throughput path starting at a slot $n$ at steady state. Then $\bar{s}_l = r_l / T_l^*$ ($l = 1, 2$) and $\bar{s}_1 > \bar{s}_2$, so user 1 is probed before user 2 in each slot. From Assumption (A1), $s_1$ and $s_2$ are scaled versions of the same distribution, but $s_1$ has the larger mean. Thus user 1 is selected for transmission more often than user 2, which implies $T_1(n + n_1)/r_1 > T_2(n + n_1)/r_2$ after a sufficiently large number $n_1$ of slots. This contradicts the steady-state assumption $T_1^*/r_1 < T_2^*/r_2$. ■

Note that the constant proportionality factor $\kappa$ connects the steady-state throughput and the mean rate: once $\kappa$ is known, it is straightforward to evaluate the throughput and the utility. Moreover, since $\kappa$ is common to all users, the proof of Theorem 2 yields the following corollary.

Corollary 1: Under Algorithm 1, every user is selected as the destination with the same probability $1/K$.

Algorithm 1 is asymptotically optimal in the following sense:

Theorem 3: Assume (A1). Then $\mathbf{T}^*$ maximizes the PF utility $u(\cdot)$ over the rate region generated by all joint probing and scheduling schemes.

Proof: Let $\mathcal{S}$ denote the set of all feasible schemes $\Gamma$ under the constraint that only one user can be selected in each slot, and denote the scheme developed in this paper by $\Gamma^*$. We have shown in the derivation of Algorithm 1 that $\Gamma^*$ optimally solves the monotone stopping problem in each slot; that is, it maximizes the expected value of $\sum_k B_k(n)/T_k(n-1)$ in slot $n$. Since only one user can be scheduled in each slot, the developed scheme $\Gamma^*$ satisfies

$$\Gamma^* \in \arg\max_{\Gamma \in \mathcal{S}} \mathbb{E}\Big[ \sum_{k=1}^{K} \frac{B_k^{(\Gamma)}(n)}{T_k(n-1)} \,\Big|\, \mathbf{T}(n-1) \Big], \qquad (10)$$

where $B_k^{(\Gamma)}(n)$ is the number of bits transmitted to user $k$ in slot $n$ under scheme $\Gamma$. Recalling the definition of the utility function in (2), whose gradient has components $\partial u / \partial T_k = 1/T_k$, we have

$$\sum_{k=1}^{K} \frac{B_k^{(\Gamma)}(n)}{T_k(n-1)} = \nabla u(\mathbf{T}(n-1)) \cdot \mathbf{B}^{(\Gamma)}(n), \qquad (11)$$

which means that the scheme chooses a decision maximizing the scalar product of $\mathbf{B}^{(\Gamma)}(n)$ and the gradient $\nabla u(\mathbf{T}(n-1))$.

In the gradient scheduling algorithm developed by Stolyar [17], at time $n$ the controller chooses a decision $\Gamma(n) \in \arg\max_{\Gamma} \nabla u(\mathbf{T}(n-1)) \cdot \mathbf{B}^{(\Gamma)}(n)$. Let $\tilde{\mathbf{T}}$ denote the solution to the problem

$$\max_{\mathbf{T}}\ u(\mathbf{T}) \quad \text{s.t.} \quad \mathbf{T} \in \mathcal{V},$$

where $\mathcal{V}$ is the system rate region, i.e., the set of all feasible long-term service rate vectors. Then [17, Theorem 2] shows that the expected average service rates under the gradient scheduling algorithm converge in probability to $\tilde{\mathbf{T}}$.

By (10) and (11), the joint probing and scheduling algorithm in this paper belongs to the class of gradient scheduling algorithms. From the convergence of Algorithm 1, $\mathbf{T}^* = \tilde{\mathbf{T}}$. Hence the achieved throughput $\mathbf{T}^*$ asymptotically maximizes the PF utility function. ■

C. A Static Threshold Criterion

Note that in Algorithm 1, after each probe, the scheduler needs to evaluate the expectation in (8), which depends on the channel realizations. The computational complexity can be further reduced by simply comparing the highest normalized rate against a sequence of deterministic thresholds instead of computing (8). Consider the steady-state case where the users' throughput is exactly $\mathbf{T}^*$. By Theorem 2,

$$\frac{R_{k_{j+1}}(n)}{T_{k_{j+1}}(n-1)} = \frac{R_{k_{j+1}}(n)}{T_{k_{j+1}}^*} = \frac{r_{k_{j+1}} X_{k_{j+1}}}{\kappa\, r_{k_{j+1}}} = \frac{X_{k_{j+1}}}{\kappa},$$

which is identically distributed as $X_1/\kappa$. For $1 \le j \le J_{\max} - 1$, the inequality for $w_j$ in (8) reduces to

$$(1 - j\beta)\, w_j \ge (1 - (j+1)\beta)\, \mathbb{E}\big[\max(w_j, \kappa^{-1} X_1) \,\big|\, w_j\big]. \qquad (12)$$



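It turns out that (12) can be reduced to comparing $\kappa w_j$ with a static threshold $v_j$, which can be determined as follows. Let $F_X(\cdot)$ denote the cumulative distribution function (CDF) of $X_k$. Then

$$\mathbb{E}\Big[\max\Big(w_j, \frac{X_1}{\kappa}\Big) \,\Big|\, w_j\Big] = w_j + \int_{\kappa w_j}^{\infty} \Big(\frac{x}{\kappa} - w_j\Big)\, dF_X(x). \qquad (13)$$

Hence (12) can be rewritten as

$$(1 - j\beta)\, w_j \ge (1 - (j+1)\beta) \Big[ w_j + \int_{\kappa w_j}^{\infty} \Big(\frac{x}{\kappa} - w_j\Big)\, dF_X(x) \Big], \qquad (14)$$

or, equivalently (subtracting $(1 - (j+1)\beta)\, w_j$ from both sides and multiplying by $\kappa/\beta$),

$$\kappa w_j \ge g_j(\kappa w_j), \qquad (15)$$

where

$$g_j(v) = \Big[\frac{1}{\beta} - (j+1)\Big] \int_{v}^{\infty} (x - v)\, dF_X(x). \qquad (16)$$

It is not hard to check that: (i) $g_j(v) > 0$ for $v \ge 0$; (ii) $g_j(v)$ is a strictly decreasing function of $v$; (iii) $\lim_{v \to \infty} g_j(v) = 0$. Hence inequality (15) is equivalent to $\kappa w_j \ge v_j$, where $v_j$ is the crossing point of $f(v) = v$ and $g_j(v)$. Also, $g_j(v) > g_{j+1}(v)$, from which it is easy to verify that $v_{j+1} < v_j$. The solution to (15) is illustrated in Fig. 2.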
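Under the exponential assumption of Section V, the integral in (16) is $\int_v^\infty (x - v) e^{-x}\,dx = e^{-v}$, so $g_j(v) = (1/\beta - j - 1)e^{-v}$ and $v_j$ solves $v e^{v} = 1/\beta - j - 1$ (a Lambert W value). The sketch below finds $v_j$ by bisection rather than relying on a Lambert W implementation; the exponential CDF is an assumption made for illustration only.

```python
import math
from typing import List


def threshold(j: int, beta: float) -> float:
    """Crossing point v_j of f(v) = v and g_j(v) = (1/beta - (j+1)) * exp(-v).

    Assumes X ~ Exp(1), for which the integral in (16) equals exp(-v).
    """
    c = 1.0 / beta - (j + 1)
    if c <= 0:
        return 0.0                 # g_j is not positive: stopping is always at least as good
    lo, hi = 0.0, c                # v - c*exp(-v) is negative at v = 0 and positive at v = c
    for _ in range(200):           # bisection on v - c*exp(-v)
        mid = 0.5 * (lo + hi)
        if mid - c * math.exp(-mid) < 0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def thresholds(beta: float, j_max: int) -> List[float]:
    """v_1, ..., v_{J_max}, with the convention v_{J_max} = 0."""
    return [threshold(j, beta) for j in range(1, j_max)] + [0.0]


# Usage: beta = 0.1 and J_max = 10; the values depend only on j, beta and F_X.
vs = thresholds(0.1, 10)
print([round(v, 4) for v in vs])
```

As stated above, the thresholds decrease in $j$ and do not depend on $K$, $r_k$ or $\kappa$.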

From the structure of (16), the crossing point $v_j$ is determined only by $j$, $\beta$ and the CDF $F_X(\cdot)$, i.e., by the distribution of the unit-mean random variable $X_k$. The value of $v_j$ is independent of the number of users $K$, the mean rates $r_k$ of the users, and the achieved throughput-to-mean-rate ratio $\kappa$. Hence, if the transmitter knows the distribution $F_X(\cdot)$, it can compute $v_j$ in advance.

Now inequality (15) can be expressed as $w_j \ge v_j/\kappa$ for $1 \le j \le J_{\max} - 1$, which is also equivalent to the inequality in (8) in the steady-state case. Thus the decision on whether to keep probing or to start transmitting is made by a static threshold criterion. For completeness, let $v_{J_{\max}} = 0$ to ensure that probing always terminates in each slot. We obtain the following static-threshold probing criterion, which can replace line 9 in Algorithm 1.

Criteria 1: After probing $j$ users, if the current largest normalized rate satisfies $w_j \ge v_j/\kappa$, the transmitter transmits to the user with the largest normalized rate; otherwise, it probes the $(j+1)$st user.
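Criteria 1 itself is a simple scan over the probed users. The following illustrative function (its name and interface are hypothetical) takes precomputed thresholds $v_1, \dots, v_{J_{\max}}$ and $\kappa$, and returns the stopping step and the position of the served user in the probing order.

```python
from typing import List, Tuple


def criteria1(normalized_rates: List[float], v: List[float],
              kappa: float) -> Tuple[int, int]:
    """Apply Criteria 1 to throughput-normalized rates listed in probing order.

    v[j-1] holds the threshold v_j (with v_{J_max} = 0). Returns (J, idx): the number
    of probes used and the position, in probing order, of the user that is served.
    """
    w, best = float("-inf"), -1
    for j, s in enumerate(normalized_rates[: len(v)], start=1):
        if s > w:                      # w_j: largest normalized rate seen so far
            w, best = s, j - 1
        if w >= v[j - 1] / kappa:      # stop and transmit to the best user so far
            return j, best
    raise ValueError("v_{J_max} must be 0 so that probing always terminates")


# Usage with made-up thresholds and kappa = 0.5 (effective cut-offs 3.2, 3.0, 0.0).
print(criteria1([1.0, 2.0, 0.5], [1.6, 1.5, 0.0], 0.5))
print(criteria1([4.0, 0.1, 0.1], [1.6, 1.5, 0.0], 0.5))
```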

In practice, the scheduler can calculate $v_j$ in advance, but $\kappa$ is unavailable at the beginning. One way to estimate $\kappa$ is to start the joint probing and scheduling with the dynamic criterion in line 9 of Algorithm 1. After a period of time, the throughput approaches its steady-state value; the throughput-to-mean-rate ratio $\kappa$ is then obtained, and the static threshold criterion can be used thereafter. Alternatively, $\kappa$ can be determined theoretically, as discussed in the next subsection.

D. The Scheduling Gain 

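In this subsection we analyze the performance of the proposed scheme theoretically. We define the scheduling gain as the ratio of the achieved throughput to that of round robin scheduling without probing; it reflects how much of the multiuser diversity benefit can be exploited. Since round robin without probing gives user $k$ a throughput of $r_k/K$, the scheduling gain of the proposed joint probing and scheduling scheme is $K T_k^*/r_k = \kappa K$. For a random variable $X$, let $[X]_a^b$ denote the truncation of $X$ to $[a, b]$, i.e., $X$ conditioned on $a < X < b$, so that $\mathbb{E}[X \mid a < X < b] = \mathbb{E}[X]_a^b$.

Theorem 4: Under the homogeneous rate assumption (A1), the scheduling gain of Algorithm 1 is

$$\kappa K = \sum_{j=1}^{J_{\max}} \Big[ (F_X(v_{j-1}))^{j-1} - (F_X(v_j))^{j} \Big] (1 - j\beta)\, \mathbb{E}\Big\{ \Big[ \max\big( [X_1]_0^{v_{j-1}}, \dots, [X_{j-1}]_0^{v_{j-1}}, X_j \big) \Big]_{v_j}^{\infty} \Big\},$$

where $v_j$ is the solution of $v = g_j(v)$ for $1 \le j \le J_{\max} - 1$, and $v_{J_{\max}} = 0$.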

Recall that $J^*$ is the optimal stopping time, i.e., the number of users probed before a user is scheduled. We prove Theorem 4 using the following supporting lemma.

Lemma 1: Under Algorithm 1, the steady-state probability that $j$ users are probed before transmission is

$$p_j = (F_X(v_{j-1}))^{j-1} - (F_X(v_j))^{j}, \quad 1 \le j \le J_{\max}. \qquad (17)$$
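Proof: At steady state, all users' throughput-normalized mean rates $r_k/T_k^*$ are identical, and the normalized rate of a probed user is distributed as $X/\kappa$. Let $q_j = \Pr\{J^* \ge j\}$, i.e., the probability that at least $j$ users are probed before transmission. Then $q_1 = 1$. By Criteria 1, probing continues past step $i$ if and only if $\max(X_1, \dots, X_i) < v_i$; since the thresholds decrease in $i$, for $j \ge 2$ this holds for all $i \le j-1$ exactly when it holds for $i = j-1$. Hence

$$q_j = \Pr\{\max(X_1, \dots, X_{j-1}) < v_{j-1}\} = \Pr\{X_1 < v_{j-1}\} \cdots \Pr\{X_{j-1} < v_{j-1}\} = (F_X(v_{j-1}))^{j-1}.$$

Like $v_j$, $q_j$ is completely determined by the rate distribution. Clearly, $p_j = q_j - q_{j+1}$ for $j \le J_{\max} - 1$ and $p_{J_{\max}} = q_{J_{\max}}$. ■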
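As a numerical check of Lemma 1, the sketch below assumes exponential $X_k$, computes $p_j$ from (17), and compares it with the empirical distribution of $J^*$ obtained by applying Criteria 1 to i.i.d. samples at steady state (where $\kappa w_j$ equals the largest probed $X$). The threshold routine repeats the bisection for (16) so that the block is self-contained; all function names are illustrative.

```python
import math
import random
from typing import List


def thresholds_exp(beta: float, j_max: int) -> List[float]:
    """v_1..v_{J_max} for X ~ Exp(1): v_j solves v = (1/beta - j - 1) exp(-v); v_{J_max} = 0."""
    vs = []
    for j in range(1, j_max):
        c = 1.0 / beta - (j + 1)
        lo, hi = 0.0, max(c, 0.0)
        for _ in range(200):                       # bisection on v - c*exp(-v)
            mid = 0.5 * (lo + hi)
            lo, hi = (mid, hi) if mid - c * math.exp(-mid) < 0 else (lo, mid)
        vs.append(0.5 * (lo + hi))
    return vs + [0.0]


def lemma1_probs(vs: List[float]) -> List[float]:
    """p_j = F(v_{j-1})^{j-1} - F(v_j)^j from (17), with F(x) = 1 - exp(-x)."""
    def F(x: float) -> float:
        return 1.0 - math.exp(-x)
    q = [1.0] + [F(vs[j - 1]) ** j for j in range(1, len(vs) + 1)]  # q[j] = Pr{J* >= j+1}
    return [q[j - 1] - q[j] for j in range(1, len(vs) + 1)]


def simulate_stops(vs: List[float], trials: int, seed: int = 0) -> List[float]:
    """Empirical distribution of J* under Criteria 1 at steady state."""
    rng = random.Random(seed)
    counts = [0] * len(vs)
    for _ in range(trials):
        best = 0.0
        for j, v in enumerate(vs, start=1):
            best = max(best, rng.expovariate(1.0))  # largest X among the first j probes
            if best >= v:
                counts[j - 1] += 1
                break
    return [c / trials for c in counts]


vs = thresholds_exp(0.1, 10)
p = lemma1_probs(vs)
emp = simulate_stops(vs, 100_000, seed=3)
print([round(x, 4) for x in p])
print([round(x, 4) for x in emp])
```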
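Proof of Theorem 4: Consider a specific user $k$. In the steady state, $\dot{\mathbf{T}}(t) = 0$, so by Theorem 2 user $k$'s throughput is given by $T_k^* = \mathbb{E}[B_k(n) \mid \mathbf{T}^*]$. Throughout, let $K^*$ denote the index of the user selected as the destination. The event $\{K^* = k\}$, i.e., user $k$ is selected as the destination, can be decomposed into $J_{\max}$ mutually exclusive sub-events: $\{K^* = k\} = \bigcup_{j=1}^{J_{\max}} \{K^* = k, J^* = j\}$. Then

$$\begin{aligned}
T_k^* &= \mathbb{E}[B_k(n) \mid \mathbf{T}^*] = \mathbb{E}[(1 - J^*\beta) R_k I_k] = \Pr\{K^* = k\}\, \mathbb{E}[(1 - J^*\beta) R_k \mid K^* = k] \\
&\overset{(a)}{=} \frac{1}{K}\, \mathbb{E}[(1 - J^*\beta) R_k \mid K^* = k] \\
&\overset{(b)}{=} \frac{1}{K} \sum_{j=1}^{J_{\max}} p_j\, (1 - j\beta)\, \mathbb{E}[R_k \mid K^* = k, J^* = j] \\
&\overset{(c)}{=} \frac{T_k^*}{K} \sum_{j=1}^{J_{\max}} p_j\, (1 - j\beta)\, \mathbb{E}\Big[ \max_{i \le j} \frac{R_{k_i}}{T_{k_i}^*} \,\Big|\, \max_{i \le j-1} \frac{R_{k_i}}{T_{k_i}^*} < \frac{v_{j-1}}{\kappa},\ \max_{i \le j} \frac{R_{k_i}}{T_{k_i}^*} \ge \frac{v_j}{\kappa} \Big] \\
&\overset{(d)}{=} \frac{T_k^*}{\kappa K} \sum_{j=1}^{J_{\max}} p_j\, (1 - j\beta)\, \mathbb{E}\Big[ \max(X_1, \dots, X_j) \,\Big|\, \max(X_1, \dots, X_{j-1}) < v_{j-1},\ \max(X_1, \dots, X_j) \ge v_j \Big] \\
&\overset{(e)}{=} \frac{T_k^*}{\kappa K} \sum_{j=1}^{J_{\max}} p_j\, (1 - j\beta)\, \mathbb{E}\Big\{ \Big[ \max\big( [X_1]_0^{v_{j-1}}, \dots, [X_{j-1}]_0^{v_{j-1}}, X_j \big) \Big]_{v_j}^{\infty} \Big\},
\end{aligned}$$

where (a) follows from Corollary 1; (b) from the law of total probability, using the symmetry among users at steady state so that $\Pr\{J^* = j \mid K^* = k\} = p_j$; (c) from the static threshold criterion, since $\{K^* = k, J^* = j\}$ means that i) user $k$ has the largest throughput-normalized rate among the first $j$ probed users, ii) the first $j-1$ users' throughput-normalized rates are smaller than $\kappa^{-1} v_{j-1}$ (a void condition for $j = 1$), and iii) the largest of the first $j$ users' throughput-normalized rates is at least $\kappa^{-1} v_j$; (d) from $R_k = r_k X_k$ and (9); and (e) from the distribution of the $X_j$. Replacing $p_j$ with (17) and removing $T_k^*$ from both sides yields the conclusion of Theorem 4. ■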
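Rather than evaluating the truncated expectations in Theorem 4 analytically, one can estimate the scheduling gain by Monte Carlo, because the derivation above shows that $\kappa K = \mathbb{E}[(1 - J^*\beta)\max(X_1, \dots, X_{J^*})]$ under Criteria 1 at steady state. The sketch assumes exponential $X_k$ and uses illustrative function names; it also makes visible that $K$ enters only through $J_{\max} = \min(K, \lfloor 1/\beta \rfloor)$.

```python
import math
import random
from typing import List


def thresholds_exp(beta: float, j_max: int) -> List[float]:
    """v_1..v_{J_max} for X ~ Exp(1): v_j solves v = (1/beta - j - 1) exp(-v); v_{J_max} = 0."""
    vs = []
    for j in range(1, j_max):
        c = 1.0 / beta - (j + 1)
        lo, hi = 0.0, max(c, 0.0)
        for _ in range(200):                       # bisection on v - c*exp(-v)
            mid = 0.5 * (lo + hi)
            lo, hi = (mid, hi) if mid - c * math.exp(-mid) < 0 else (lo, mid)
        vs.append(0.5 * (lo + hi))
    return vs + [0.0]


def scheduling_gain(K: int, beta: float, trials: int = 50_000, seed: int = 0) -> float:
    """Monte Carlo estimate of kappa*K = E[(1 - J* beta) max(X_1..X_J*)] under Criteria 1."""
    n_max = int(math.floor(1.0 / beta + 1e-9))      # floor(1/beta)
    vs = thresholds_exp(beta, min(K, n_max))
    rng = random.Random(seed)
    total = 0.0
    for _ in range(trials):
        # draw floor(1/beta) samples per trial so runs with different K share the same draws
        xs = [rng.expovariate(1.0) for _ in range(n_max)]
        best = 0.0
        for j, v in enumerate(vs, start=1):
            best = max(best, xs[j - 1])
            if best >= v:                           # steady state: kappa * w_j >= v_j
                total += (1 - j * beta) * best
                break
    return total / trials


g9 = scheduling_gain(9, 0.1, seed=5)
g12 = scheduling_gain(12, 0.1, seed=5)
print(g9, g12)
```

With $\beta = 0.1$, $g_9(v) \equiv 0$, so probing always stops by the ninth user; runs with $K = 9$ and $K = 12$ therefore yield identical estimates.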

IV. Joint Learning, Probing and Scheduling 

Consider the case where the scheduler does not know a priori the statistics of the downlink channel quality, and thus has to rely on the history of the probed CQI to decide on the user probing order and user selection. Under this assumption, the problem of maximizing the PF utility function is a generalization of the classical multiarmed bandit problem [20]. It is a generalization because in the classical bandit problem, the decision maker has to decide which of $K$ random processes to observe in a sequence of trials so as to maximize the reward, where observing a process is equivalent to utilizing it. In our model, however, the scheduler may probe (observe) more than one channel (random process) in each slot and then choose only one for transmission (utilization), so an observation does not always lead to a utilization.

At the beginning of slot $n$, i.e., the end of slot $n-1$, let $M_k(n-1)$ denote the number of time slots in which the channel to user $k$ has been probed, and let $\mathcal{R}_k(n-1) = \{R_k^{(1)}, \dots, R_k^{(M_k(n-1))}\}$ record all the probed samples of the channel rate of user $k$. Clearly, the cardinality $|\mathcal{R}_k(n-1)| = M_k(n-1)$. The scheduler keeps updating the $K$ sets $\{\mathcal{R}_1(n), \dots, \mathcal{R}_K(n)\}$ from slot to slot, and it also knows the throughput $\mathbf{T}(n-1)$ up to the previous slot. The objective is still to find a scheme that solves the stopping problem in each slot. As analyzed in Section III-A, the same two tasks remain: determining the user probing order and selecting one user for transmission. Hence the problem formulation and scheme design are similar to those in Section III-A. The only difference is that the scheduler has only sampled values of the channel rates instead of explicit knowledge of the distribution of $R_k$ ($k = 1, \dots, K$), so the expectations involving $R_k$ cannot be calculated directly. Instead, we can only evaluate empirical averages using the acquired samples of $R_k$, which readily leads to an index-based policy in the framework of bandit problems.

An index policy, which at any time chooses the stochastic process with the currently highest index, is the solution to a class of bandit problems. To find the optimal scheme here, we adopt a methodology similar to the development of the index-based policy by Agrawal in [21]. For the decision on the user probing order, we use the current average reward, i.e., the throughput-normalized average rate, as the index. For the decision on when to start transmission, we use the bits actually served in the current slot, i.e., the product of $1 - j\beta$ and the conditional throughput-normalized average rate. For convenience in presenting the algorithm, we define the following two empirical averages:

$$\hat{s}_k(n) = \frac{1}{M_k(n-1)} \sum_{m=1}^{M_k(n-1)} \frac{R_k^{(m)}}{T_k(n-1)}, \qquad (18)$$

$$\hat{e}_k(n, w) = \frac{1}{M_k(n-1)} \sum_{m=1}^{M_k(n-1)} \Big( w \vee \frac{R_k^{(m)}}{T_k(n-1)} \Big). \qquad (19)$$

The index $\hat{s}_k(n)$ replaces $\bar{s}_k(n)$ in Algorithm 1, and $\hat{e}_k(n, w)$ replaces the expectation $\mathbb{E}[\cdot]$ in line 9 of Algorithm 1. The resulting joint PF learning, probing and scheduling (JLPS-PF) algorithm is described in Algorithm 2.

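Algorithm 2: JLPS-PF

1  Initialization: $n_0 = \lceil \beta K \rceil$. For $k = 1, \dots, K$, $T_k(n_0) \leftarrow 1$. In the first $n_0$ slots, sequentially probe each channel once, so that each of the sets $\mathcal{R}_k(n_0)$ ($k = 1, \dots, K$) is nonempty. $M_k(n_0) \leftarrow 1$;
2  for $n = n_0 + 1, n_0 + 2, \dots$ do
3      $\hat{s}_k(n) \leftarrow \frac{1}{M_k(n-1)} \sum_{m=1}^{M_k(n-1)} R_k^{(m)} / T_k(n-1)$. Sort $\hat{s}_k(n)$ ($k = 1, \dots, K$) in descending order: $\hat{s}_{k_1}(n) \ge \dots \ge \hat{s}_{k_K}(n)$;
4      $j \leftarrow 0$, $w \leftarrow 0$;
5      do
6          $j \leftarrow j + 1$;
7          Probe user $k_j$ and get the rate $R_{k_j}(n)$;
8          $w \leftarrow w \vee R_{k_j}(n) / T_{k_j}(n-1)$;
9          $\hat{e}_{k_{j+1}}(n, w) \leftarrow \frac{1}{M_{k_{j+1}}(n-1)} \sum_{m=1}^{M_{k_{j+1}}(n-1)} \big( w \vee R_{k_{j+1}}^{(m)} / T_{k_{j+1}}(n-1) \big)$;
10         $\mathcal{R}_{k_j}(n) \leftarrow \mathcal{R}_{k_j}(n-1) \cup \{R_{k_j}(n)\}$, $M_{k_j}(n) \leftarrow M_{k_j}(n-1) + 1$;
11     while $(1 - j\beta)\, w < (1 - (j+1)\beta)\, \hat{e}_{k_{j+1}}(n, w)$;
12     Transmit to the probed user whose normalized rate equals $w$. Update $\mathbf{T}(n)$;
13     For $k = k_{j+1}, \dots, k_K$: $\mathcal{R}_k(n) \leftarrow \mathcal{R}_k(n-1)$, $M_k(n) \leftarrow M_k(n-1)$;
14 end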
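The empirical indices (18) and (19) and one slot of Algorithm 2 can be sketched in Python as follows. The callback `probe` and the dictionary `samples` (standing for the sets $\mathcal{R}_k(n-1)$) are hypothetical interfaces chosen for illustration, and the throughput update of line 12 is left to the caller.

```python
from typing import Callable, Dict, List, Tuple


def s_hat(samples: List[float], T: float) -> float:
    """Empirical index (18): mean of the probed rates normalized by T_k(n-1)."""
    return sum(samples) / (len(samples) * T)


def e_hat(samples: List[float], T: float, w: float) -> float:
    """Empirical estimate (19) of E[w v R_k / T_k(n-1)] from the probed samples."""
    return sum(max(w, x / T) for x in samples) / len(samples)


def jlps_slot(samples: Dict[int, List[float]], T: List[float], beta: float,
              j_max: int, probe: Callable[[int], float]) -> Tuple[int, int, float]:
    """Probing and selection for one slot (lines 3-12 of Algorithm 2).

    samples[k] is extended in place with the rates probed in this slot; probe(k) is a
    hypothetical callback returning R_k(n). Returns (J, served user, its rate).
    """
    order = sorted(range(len(T)), key=lambda k: s_hat(samples[k], T[k]), reverse=True)
    j, w, best, best_rate = 0, 0.0, order[0], 0.0
    while True:
        k = order[j]
        j += 1
        rate = probe(k)                                   # line 7
        if rate / T[k] > w:                               # line 8
            w, best, best_rate = rate / T[k], k, rate
        samples[k].append(rate)                           # line 10
        if j >= j_max:                                    # cannot probe further
            break
        nxt = order[j]
        # line 11: keep probing only while stopping is worse than one more probe
        if (1 - j * beta) * w >= (1 - (j + 1) * beta) * e_hat(samples[nxt], T[nxt], w):
            break
    return j, best, best_rate


# Usage with made-up sample histories and current rates.
samples = {0: [1.0, 3.0], 1: [1.5], 2: [0.5, 0.5]}
current = {0: 0.2, 1: 5.0, 2: 1.0}
result = jlps_slot(samples, [1.0, 1.0, 1.0], 0.1, 3, lambda k: current[k])
print(result, samples)
```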



From the description of Algorithm 2, one may wonder whether the following phenomenon can occur: if a user's channel is probed with relatively high values in the first few slots, the user then has low priority of being probed afterwards, so that the empirical average of its channel rate stays above its statistical expectation. This does not happen, thanks to the structure of the algorithm, which is derived from the objective of maximizing the PF utility. If user $k$ is probed and selected less frequently than other users, its achieved throughput $T_k(n)$ becomes small, which in turn increases its priority of being probed and selected. In fact, the throughput-normalized rate used in PF scheduling is a well-balanced metric that guarantees that each user is sampled sufficiently many times and with identical frequency. Hence, after Algorithm 2 runs for a sufficiently long time, the sampled data of each user's channel rate characterize the statistics of $\mathbf{R}$ well. By the law of large numbers, the empirical averages then converge to the statistical expectations, and the performance of Algorithm 2 is almost the same as that of Algorithm 1.

V. Numerical Results 

In this section, we provide numerical experiments illustrating the theoretical findings of the previous sections. Our objectives are (i) to evaluate the performance of the developed schemes with and without channel statistics, and (ii) to compare the developed scheme for achieving PF with some ideal and practical schemes and to quantify the impact of the CQI cost on scheduling. We consider the scenario where the users' rates follow exponential distributions with mean equal to the user index. The exponential rate assumption is an appropriate approximation of the Shannon capacity under Rayleigh fading channels in the low-SNR regime.

A. Evaluation of the Proposed Algorithms 

Consider $K = 20$ users and let the fraction of a slot used by one probe be $\beta = 0.1$. Up to $J_{\max} = 10$ users can be probed in each slot.

Fig. 3 presents a sample throughput trajectory of user 1 when scheduled with Algorithm 1, with the static threshold criterion given in Criteria 1, and with Algorithm 2. The simulation runs for 10,000 slots in this experiment, and the time axis is in logarithmic scale to highlight the transient behavior. The static threshold criterion works well. The variation of the throughput diminishes over time as more and more time slots are included in the averaging. It is worth noting that the low complexity of the static threshold criterion for solving the optimal stopping problem comes from explicit knowledge of the channel statistics. If this information is not known, or if the distribution of the channel rate varies over time, we can only adopt the dynamic criterion given in Algorithm 1.

Fig. 4 illustrates how often each user is scheduled in a relatively short period of 2000 slots. Each of the 20 users is selected as the destination in roughly 100 slots; that is, the scheme is fair to all users even within a short time window.

Fig. 5 presents the probability that $k$ users have been probed before transmission. The theoretical results are from Lemma 1. The figure shows that the results of both Algorithm 1 and Algorithm 2 coincide with the theoretical results. We also observe that the probability decreases sharply as the number of probes approaches $J_{\max}$.

Fig. 6 plots the scheduling gain of the proposed algorithms versus the number of users in the system. The simulation runs for 20,000 slots, and the simulation results match the analytical result of Theorem 4 quite well. We also note that the scheduling gain remains about the same for more than 9 users. This is because the gain in Theorem 4 depends on $K$ only through $J_{\max}$: once enough users are present, the probing cost dominates, and additional users cannot be exploited within a slot.

B. Comparison between the Proposed Scheme and Other Schemes 

The fraction of a slot used to probe one user is still set to $\beta = 0.1$. Four schemes are considered: (a) the proposed joint probing and scheduling scheme; (b) round robin scheduling; (c) the genie-aided PF (GA-PF) scheme, where full CQI is available to the scheduler at the beginning of each slot; and (d) the probe-all PF (PA-PF) scheme, where the transmitter probes all users before scheduling. For both (c) and (d), the transmitter selects the user with the largest $R_k(n)/T_k(n-1)$ for transmission. From [22], the scheduling gain of GA-PF is $\mathbb{E}\big[\max_{k = 1, \dots, K} X_k\big]$. Hence that of PA-PF is $\max(1 - K\beta, 0)\, \mathbb{E}\big[\max_{k = 1, \dots, K} X_k\big]$.
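Under the exponential rate model of this section, $\mathbb{E}[\max_k X_k]$ for $K$ i.i.d. unit-mean exponentials equals the harmonic number $H_K = \sum_{i=1}^{K} 1/i$, so both benchmark gains have closed forms. A short sketch (function names are illustrative):

```python
def harmonic(K: int) -> float:
    """E[max(X_1, ..., X_K)] for i.i.d. Exp(1) variables equals H_K = sum_{i=1}^{K} 1/i."""
    return sum(1.0 / i for i in range(1, K + 1))


def ga_pf_gain(K: int) -> float:
    """Genie-aided PF: full CQI at no cost."""
    return harmonic(K)


def pa_pf_gain(K: int, beta: float) -> float:
    """Probe-all PF: every slot loses K * beta of its duration to probing."""
    return max(1.0 - K * beta, 0.0) * harmonic(K)


for K in (1, 2, 5, 10, 20):
    print(K, round(ga_pf_gain(K), 4), round(pa_pf_gain(K, 0.1), 4))
```

With $\beta = 0.1$, the PA-PF gain vanishes once $K \ge 10$, while the GA-PF gain keeps growing like $\ln K$.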

Fig. 7 presents the scheduling gain of schemes (a)-(d) as a function of the number of users. When the probing cost is taken into account, the scheduling gain does not always increase but approaches a limit as the number of users increases. This indicates that, by ignoring the cost of channel probing, the ideal genie-aided PF scheme does not reflect the correct multiuser diversity characteristics. The comparison also shows the advantage of the proposed joint probing and scheduling scheme. The probe-all PF scheme achieves a higher gain than round robin when the user population is not very large compared with $1/\beta$. However, when the number of users increases beyond this point, the scheduling gain of the probe-all algorithm vanishes, because almost the entire slot is used for probing instead of data transmission.

Fig. 8 displays the sum throughput of all schemes as the number of users increases. There is a relatively large gap between the ideal genie-aided PF curve and the proposed scheme; this gap quantifies the extent to which user probing degrades system performance. For example, when the number of users is $K = 20$, the throughput of the joint probing and scheduling scheme is only 55.64% of that of the genie-aided PF. Nevertheless, the joint scheme achieves the highest throughput among the non-ideal schemes (a), (b) and (d). The probe-all PF scheme performs similarly to the joint probing and scheduling scheme when there are few users ($K < 6$), but it degrades quickly, and its throughput even vanishes, as the number of users becomes large.

VI. Conclusion 

We have studied the problem of achieving proportional fairness in wireless systems while explicitly taking the channel probing cost into account. An optimal adaptive joint probing and scheduling scheme has been presented, together with a static threshold-based criterion for deciding whether to probe another user or to transmit. Using steady-state analysis, we have evaluated the scheduling gain explicitly. We have also extended the scheme to the case in which the scheduler has no knowledge of the channel rate distribution; the extended scheme achieves almost the same performance as the algorithm designed under known rate statistics and outperforms the other non-ideal PF schemes. In this work, we have focused on the well-studied proportional fairness rule. It is possible to extend the results to more general utilities, for example, the α-fair utility [7], and the methodology presented in this paper can be carried over to that case as well.

September 15, 2010
DRAFT

Appendix A 
Proof of Theorem CD 

Proof: Let the largest throughput-normalized user rate after probing j users be denoted by

$$ w_j = \max_{1 \le l \le j} s_{k_l}. \tag{20} $$

Then the current reward can be written as $y_j(s_{k_1}, \dots, s_{k_j}) = (1 - j\beta)\, w_j$, and the expected reward obtained from probing the next user is

$$ E\bigl[y_{j+1}(s_{k_1}, \dots, s_{k_{j+1}}) \mid s_{k_1}, \dots, s_{k_j}\bigr] = \bigl(1 - (j+1)\beta\bigr)\, E\bigl[w_j \vee s_{k_{j+1}} \mid w_j\bigr]. \tag{21} $$

Then the event $\mathcal{E}_j$ can be expressed as

$$ \mathcal{E}_j = \Bigl\{(1 - j\beta)\, w_j \ge \bigl(1 - (j+1)\beta\bigr)\, E\bigl[w_j \vee s_{k_{j+1}} \mid w_j\bigr]\Bigr\}. \tag{22} $$

We first show that there exists a threshold $w_j^{(\mathrm{th})}$ such that the event $\mathcal{E}_j$ can be represented as $\mathcal{E}_j = \{w_j \ge w_j^{(\mathrm{th})}\}$. To this end, let $f_j(w) = (1 - j\beta)\, w - (1 - (j+1)\beta)\, E[w \vee s_{k_{j+1}}]$. Then $\mathcal{E}_j$ occurs if and only if $f_j(w_j) \ge 0$. It is easy to verify that $f_j(0) \le 0$ and that $f_j(w) > 0$ for sufficiently large $w$. The function $f_j(w)$ can be rewritten as $f_j(w) = \beta E[w \vee s_{k_{j+1}}] + (1 - j\beta)\, E[w - w \vee s_{k_{j+1}}]$. For any $w' > w > 0$,

$$ f_j(w') - f_j(w) = \beta E\bigl[w' \vee s_{k_{j+1}} - w \vee s_{k_{j+1}}\bigr] + (1 - j\beta)\, E\bigl[w' - w - w' \vee s_{k_{j+1}} + w \vee s_{k_{j+1}}\bigr]. $$

Note that $w' \vee s_{k_{j+1}} \ge w \vee s_{k_{j+1}}$ and $w' - w \ge w' \vee s_{k_{j+1}} - w \vee s_{k_{j+1}}$. Thus $f_j(w') - f_j(w) \ge 0$, that is, $f_j(w)$ is a nondecreasing function. Summarizing these properties of $f_j(w)$, the solution to $f_j(w) \ge 0$ can be expressed as $w \ge w_j^{(\mathrm{th})}$.
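As a worked example (an illustrative assumption, not part of the proof), suppose $s_{k_{j+1}}$ is unit-mean exponential. Then $E[w \vee s] = w + E[(s - w)^+] = w + e^{-w}$ for $w \ge 0$, and $f_j$ takes a simple closed form:

```latex
\begin{aligned}
f_j(w)  &= (1 - j\beta)\, w - \bigl(1 - (j+1)\beta\bigr)\bigl(w + e^{-w}\bigr) \\
        &= \beta w - \bigl(1 - (j+1)\beta\bigr)\, e^{-w}, \\
f_j'(w) &= \beta + \bigl(1 - (j+1)\beta\bigr)\, e^{-w} > 0
          \quad \text{whenever } (j+1)\beta < 1 .
\end{aligned}
```

Hence $f_j(0) = -(1 - (j+1)\beta) < 0$, $f_j$ is increasing, and the threshold is the unique root of $\beta w = (1 - (j+1)\beta)\, e^{-w}$, i.e. $w_j^{(\mathrm{th})} = W\bigl((1 - (j+1)\beta)/\beta\bigr)$ with $W$ the Lambert W function. For $\beta = 0.1$ and $j = 1$ this gives $w e^{w} = 8$, so $w_1^{(\mathrm{th})} \approx 1.61$.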
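We next show that $w_{j+1}^{(\mathrm{th})} \le w_j^{(\mathrm{th})}$. For fixed $w$,

$$
\begin{aligned}
f_{j+1}(w) - f_j(w)
&= \bigl(1 - (j+1)\beta\bigr) w - \bigl(1 - (j+2)\beta\bigr) E_{s_{k_{j+2}}}\bigl[w \vee s_{k_{j+2}}\bigr] - (1 - j\beta) w + \bigl(1 - (j+1)\beta\bigr) E\bigl[w \vee s_{k_{j+1}}\bigr] \\
&= \beta E_{s_{k_{j+2}}}\bigl[w \vee s_{k_{j+2}} - w\bigr] + \bigl(1 - (j+1)\beta\bigr)\Bigl(E\bigl[w \vee s_{k_{j+1}}\bigr] - E_{s_{k_{j+2}}}\bigl[w \vee s_{k_{j+2}}\bigr]\Bigr) \\
&\ge 0,
\end{aligned} \tag{23}
$$

where the last inequality follows from the fact that $s_{k_{j+1}}$ and $s_{k_{j+2}}$ have the same type of distribution and $E s_{k_{j+1}} \ge E s_{k_{j+2}}$. Note that $w_j^{(\mathrm{th})}$ and $w_{j+1}^{(\mathrm{th})}$ are the zero points of $f_j(w)$ and $f_{j+1}(w)$, respectively. Since $f_{j+1}(w) \ge f_j(w)$ and both functions are nondecreasing, $w_{j+1}^{(\mathrm{th})} \le w_j^{(\mathrm{th})}$, as illustrated in Fig. 1.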

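Collecting the preceding results and noting that $w_{j+1} \ge w_j$, we have $\mathcal{E}_j = \{w_j \ge w_j^{(\mathrm{th})}\} \subseteq \{w_{j+1} \ge w_j^{(\mathrm{th})}\} \subseteq \{w_{j+1} \ge w_{j+1}^{(\mathrm{th})}\} = \mathcal{E}_{j+1}$.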
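The threshold structure and the ordering $w_{j+1}^{(\mathrm{th})} \le w_j^{(\mathrm{th})}$ can be checked numerically. The Python sketch below is illustrative only: it assumes i.i.d. unit-mean exponential normalized rates (so $E[w \vee s] = w + e^{-w}$) and finds each threshold as the zero of $f_j(w)$ by bisection, which is valid because $f_j$ is nondecreasing with $f_j(0) \le 0$. The names `f_j` and `threshold` are hypothetical.

```python
import math


def f_j(w: float, j: int, beta: float) -> float:
    """f_j(w) = (1 - j*beta) w - (1 - (j+1)*beta) E[w v s], with s ~ Exp(1) assumed."""
    return (1 - j * beta) * w - (1 - (j + 1) * beta) * (w + math.exp(-w))


def threshold(j: int, beta: float, tol: float = 1e-12) -> float:
    """Zero of f_j by bisection; f_j is increasing with f_j(0) <= 0 here."""
    c = 1 - (j + 1) * beta
    if c <= 0:
        return 0.0  # one more probe cannot pay off: stop for any w_j >= 0
    lo, hi = 0.0, c / beta  # f_j(c/beta) = c * (1 - exp(-c/beta)) > 0
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if f_j(mid, j, beta) >= 0:
            hi = mid
        else:
            lo = mid
    return hi


beta = 0.1
ths = [threshold(j, beta) for j in range(1, 10)]
for j, th in enumerate(ths, start=1):
    print(f"w_{j}^(th) = {th:.4f}")
```

With $\beta = 0.1$ the thresholds decrease strictly in $j$ (about 1.61 at $j = 1$) and reach zero at $j = 9$, where $1 - (j+1)\beta = 0$, so under this assumption transmission always occurs after at most nine probes.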
Fig. 1: Illustration of the property of the function $f_j(w)$.
Fig. 3: The throughput trajectory of user 1 versus time slot when scheduled with Algorithm 1, the static threshold criterion and Algorithm 2, respectively. $N_{\mathrm{slot}} = 10{,}000$, $K = 20$, $\beta = 0.1$.
Fig. 4: The number of slots in which each user is selected as the destination. $N_{\mathrm{slot}} = 2000$, $\beta = 0.1$.
Fig. 5: The probability that k users have been probed until transmission. $K = 20$, $\beta = 0.1$.
Fig. 6: Comparison of the scheduling gain of Algorithm 1, Algorithm 2 and the theoretical results. $\beta = 0.1$.
Fig. 7: Scheduling gain versus number of users. $\beta = 0.1$.
Fig. 8: Sum throughput versus number of users. $\beta = 0.1$.